ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)

Indrajeet Patil
Current CRAN package count >23,000
{ggstatsplot} provides:
📊 information-rich plots with statistical details
📝 suitable for faster (exploratory) data analysis and reporting
Graphical summaries can reveal problems not visible from numerical statistics.
The grammar of graphics is a powerful framework (Wilkinson, 2011) and can help you make any graphics fitting your specific data visualization needs! But…
Quality of Life (QoL) improvements with {ggstatsplot}
Provide ready-made plots with defaults following the best practices in statistical reporting and data visualization.
In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling can suggest a different visualization, and so on and so forth.
Central idea of {ggstatsplot}
Simple: combine these two phases into one!
…but we will come back to that later 📌
Let’s get started first!
Package available for installation on CRAN and GitHub:
| Type | Command |
|---|---|
| Release | install.packages("ggstatsplot") |
| Development | pak::pak("IndrajeetPatil/ggstatsplot") |
ggbetweenstats()
Hypothesis about group differences: independent measures design
Important
✏️ Defaults
Statistical approaches available
Standard approach
Pearson’s correlation test revealed that, across 142 participants, variable x was negatively correlated with variable y: \(t(140)=-0.76, p=.446\). The effect size \((r=-0.06, 95\% CI [-.23,.10])\) was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were 5.81 times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (absence of any correlation between x and y).
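To make the standard approach concrete, here is a minimal sketch that uses only base R. The `mtcars` variables are placeholders chosen for illustration; they are not the 142-participant data described above. Note that the Bayes factor in the report would need yet another package.

```r
# Placeholder data (assumption): two numeric columns from the built-in mtcars
x <- mtcars$wt
y <- mtcars$mpg

# Step 1: run the inferential test
res <- cor.test(x, y, method = "pearson")

# Step 2: pull each reported number out of the "htest" object by hand
report <- sprintf(
  "t(%d) = %.2f, p = %.3g, r = %.2f, 95%% CI [%.2f, %.2f]",
  as.integer(res$parameter),  # degrees of freedom
  unname(res$statistic),      # t statistic
  res$p.value,
  unname(res$estimate),       # Pearson's r
  res$conf.int[1],
  res$conf.int[2]
)

# Step 3: paste the string into the text or onto a plot manually
print(report)
```

Every number is copied by hand from the test object into the report, which is exactly where reporting errors tend to creep in.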
{ggstatsplot} approach
Parametric
Hunting for packages
📦 for inferential statistics ({stats})
📦 computing effect size + CIs ({effectsize})
📦 for descriptive statistics ({skimr})
📦 pairwise comparisons ({multcomp})
📦 Bayesian hypothesis testing ({BayesFactor})
📦 Bayesian estimation ({bayestestR})
📦 …
Inconsistent APIs
🤔 accepts data frame, vector, matrix?
🤔 long/wide format data?
🤔 works with NAs?
🤔 returns data frame, vector, matrix?
🤔 works with tibbles?
🤔 has all necessary details?
🤔 …
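A small base R sketch of what these questions look like in practice. The functions are real; the datasets are built-in placeholders chosen for illustration.

```r
# Formula + data frame in, "htest" list out
tt <- t.test(extra ~ group, data = sleep)

# Two bare vectors in, "htest" list out
ct <- cor.test(mtcars$wt, mtcars$mpg)

# Contingency table in, "htest" list out
# (small expected counts trigger a warning, silenced here)
cs <- suppressWarnings(chisq.test(table(mtcars$am, mtcars$cyl)))

# Formula + data frame in, model object out; no p-value until summary()
av <- aov(Sepal.Length ~ Species, data = iris)
av_p <- summary(av)[[1]][["Pr(>F)"]][1]

# Same question ("what is the p-value?"), different access paths
c(t_test = tt$p.value, cor_test = ct$p.value, chisq = cs$p.value, anova = av_p)
```

None of these return a data frame, and the ANOVA result does not even expose its p-value directly.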
“What if I don’t like the default plots?” 🤔
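Two possible ways to deal with this, sketched under the assumption that a recent {ggstatsplot} version (one that exports `extract_subtitle()`) is installed. The labels and the boxplot are illustrative choices, not package defaults.

```r
library(ggstatsplot)
library(ggplot2)

# Option 1: the output is a ggplot object, so ggplot2 functions can modify it
p <- ggbetweenstats(data = iris, x = Species, y = Sepal.Length)
p_modified <- p +
  labs(x = "Iris species", y = "Sepal length (cm)") +
  theme(legend.position = "none")

# Option 2: keep only the statistical details and draw your own plot
p_custom <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(subtitle = extract_subtitle(p))

p_modified
p_custom
```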
Things to be wary of
Promotes mindless application of statistical tests.
Easy-to-use software can lead to misuse.
{ggplot2} extension.
Things that will pull you in
Each commit must pass many QA checks:
CI Checks (GitHub Actions)
Benefits of the {ggstatsplot} approach
{ggstatsplot} combines data visualization and statistical analysis in a single step.
It…
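A minimal sketch of the single-step idea. The `mtcars` columns are placeholders chosen for illustration.

```r
library(ggstatsplot)

# One call produces the scatterplot and runs the correlation test;
# the statistical details appear in the plot's subtitle and caption.
p <- ggscatterstats(data = mtcars, x = wt, y = mpg)
p
```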
Source code for these slides can be found on GitHub.
If you are interested in good programming and software development practices, check out my other slide decks.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os Ubuntu 22.04.5 LTS
system x86_64, linux-gnu
hostname fv-az1116-866
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz UTC
date 2024-11-11
pandoc 3.5 @ /opt/hostedtoolcache/pandoc/3.5/x64/ (via rmarkdown)
quarto 1.6.33 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base * 4.4.2 2024-10-31 [3] local
BayesFactor 0.9.12-4.7 2024-01-24 [1] RSPM
bayestestR 0.15.0 2024-10-17 [1] RSPM
bitops 1.0-9 2024-10-03 [1] RSPM
BWStest 0.2.3 2023-10-10 [1] RSPM
cachem 1.1.0 2024-05-16 [1] RSPM
cli 3.6.3 2024-06-21 [1] RSPM
coda 0.19-4.1 2024-01-31 [1] RSPM
colorspace 2.1-1 2024-07-26 [1] RSPM
compiler 4.4.2 2024-10-31 [3] local
correlation 0.8.6 2024-10-26 [1] RSPM
cranlogs 2.1.1 2019-04-29 [1] RSPM
curl 6.0.0 2024-11-05 [1] RSPM
data.table 1.16.2 2024-10-10 [1] RSPM
datasets * 4.4.2 2024-10-31 [3] local
datawizard 0.13.0 2024-10-05 [1] RSPM
digest 0.6.37 2024-08-19 [1] RSPM
dplyr 1.1.4 2023-11-17 [1] RSPM
effectsize 0.8.9 2024-07-03 [1] RSPM
evaluate 1.0.1 2024-10-10 [1] RSPM
fansi 1.0.6 2023-12-08 [1] RSPM
farver 2.1.2 2024-05-13 [1] RSPM
fastmap 1.2.0 2024-05-15 [1] RSPM
generics 0.1.3 2022-07-05 [1] RSPM
ggiraph 0.8.10 2024-05-17 [1] RSPM
ggiraphExtra 0.3.0 2020-10-06 [1] RSPM
ggplot2 * 3.5.1 2024-04-23 [1] RSPM
ggrepel 0.9.6 2024-09-07 [1] RSPM
ggsignif 0.6.4 2022-10-13 [1] RSPM
ggstatsplot * 0.12.5.9000 2024-11-10 [1] Github (IndrajeetPatil/ggstatsplot@b7350e9)
ggthemes 5.1.0 2024-02-10 [1] RSPM
glue 1.8.0 2024-09-30 [1] RSPM
gmp 0.7-5 2024-08-23 [1] RSPM
graphics * 4.4.2 2024-10-31 [3] local
grDevices * 4.4.2 2024-10-31 [3] local
grid 4.4.2 2024-10-31 [3] local
gtable 0.3.6 2024-10-25 [1] RSPM
htmltools 0.5.8.1 2024-04-04 [1] RSPM
htmlwidgets 1.6.4 2023-12-06 [1] RSPM
httr 1.4.7 2023-08-15 [1] RSPM
insight 0.20.5 2024-10-02 [1] RSPM
jsonlite 1.8.9 2024-09-20 [1] RSPM
knitr 1.49 2024-11-08 [1] RSPM
kSamples 1.2-10 2023-10-07 [1] RSPM
labeling 0.4.3 2023-08-29 [1] RSPM
lattice 0.22-6 2024-03-20 [3] CRAN (R 4.4.2)
lifecycle 1.0.4 2023-11-07 [1] RSPM
lubridate 1.9.3 2023-09-27 [1] RSPM
magrittr 2.0.3 2022-03-30 [1] RSPM
MASS 7.3-61 2024-06-13 [3] CRAN (R 4.4.2)
Matrix 1.7-1 2024-10-18 [3] CRAN (R 4.4.2)
MatrixModels 0.5-3 2023-11-06 [1] RSPM
memoise 2.0.1 2021-11-26 [1] RSPM
methods * 4.4.2 2024-10-31 [3] local
mgcv 1.9-1 2023-12-21 [3] CRAN (R 4.4.2)
multcompView 0.1-10 2024-03-08 [1] RSPM
munsell 0.5.1 2024-04-01 [1] RSPM
mvtnorm 1.3-2 2024-11-04 [1] RSPM
mycor 0.1.1 2018-04-10 [1] RSPM
nlme 3.1-166 2024-08-14 [3] CRAN (R 4.4.2)
packageRank * 0.9.3 2024-10-16 [1] RSPM
paletteer 1.6.0 2024-01-21 [1] RSPM
parallel 4.4.2 2024-10-31 [3] local
parameters 0.23.0 2024-10-18 [1] RSPM
patchwork 1.3.0 2024-09-16 [1] RSPM
pbapply 1.7-2 2023-06-27 [1] RSPM
performance 0.12.4 2024-10-18 [1] RSPM
pillar 1.9.0 2023-03-22 [1] RSPM
pkgconfig 2.0.3 2019-09-22 [1] RSPM
pkgsearch 3.1.3 2023-12-10 [1] RSPM
plyr 1.8.9 2023-10-02 [1] RSPM
PMCMRplus 1.9.12 2024-09-08 [1] RSPM
ppcor 1.1 2015-12-03 [1] RSPM
prismatic 1.1.2 2024-04-10 [1] RSPM
purrr 1.0.2 2023-08-10 [1] RSPM
R.methodsS3 1.8.2 2022-06-13 [1] RSPM
R.oo 1.27.0 2024-11-01 [1] RSPM
R.utils 2.12.3 2023-11-18 [1] RSPM
R6 2.5.1 2021-08-19 [1] RSPM
RColorBrewer 1.1-3 2022-04-03 [1] RSPM
Rcpp 1.0.13-1 2024-11-02 [1] RSPM
RCurl 1.98-1.16 2024-07-11 [1] RSPM
rematch2 2.1.2 2020-05-01 [1] RSPM
reshape2 1.4.4 2020-04-09 [1] RSPM
rlang 1.1.4 2024-06-04 [1] RSPM
rmarkdown 2.29 2024-11-04 [1] RSPM
Rmpfr 0.9-5 2024-01-21 [1] RSPM
scales 1.3.0 2023-11-28 [1] RSPM
sessioninfo 1.2.2.9000 2024-11-10 [1] Github (r-lib/sessioninfo@37c81af)
sjlabelled 1.2.0 2022-04-10 [1] RSPM
sjmisc 2.8.10 2024-05-13 [1] RSPM
splines 4.4.2 2024-10-31 [3] local
stats * 4.4.2 2024-10-31 [3] local
statsExpressions 1.6.1 2024-10-31 [1] RSPM
stringi 1.8.4 2024-05-06 [1] RSPM
stringr 1.5.1 2023-11-14 [1] RSPM
sugrrants 0.2.9 2024-03-12 [1] RSPM
SuppDists 1.1-9.8 2024-09-03 [1] RSPM
systemfonts 1.1.0 2024-05-15 [1] RSPM
tibble 3.2.1 2023-03-20 [1] RSPM
tidyr 1.3.1 2024-01-24 [1] RSPM
tidyselect 1.2.1 2024-03-11 [1] RSPM
timechange 0.3.0 2024-01-18 [1] RSPM
tools 4.4.2 2024-10-31 [3] local
utf8 1.2.4 2023-10-22 [1] RSPM
utils * 4.4.2 2024-10-31 [3] local
uuid 1.2-1 2024-07-29 [1] RSPM
vctrs 0.6.5 2023-12-01 [1] RSPM
withr 3.0.2 2024-10-28 [1] RSPM
xfun 0.49 2024-10-31 [1] RSPM
yaml 2.3.10 2024-07-26 [1] RSPM
zeallot 0.1.0 2018-01-28 [1] RSPM
[1] /home/runner/work/_temp/Library
[2] /opt/R/4.4.2/lib/R/site-library
[3] /opt/R/4.4.2/lib/R/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────
ggwithinstats()
Hypothesis about group differences: repeated measures design
Important
✏️ Defaults
Statistical approaches available
gghistostats()
Distribution of a numeric variable
Important
✏️ Defaults
Statistical approaches available
ggdotplotstats()
Labeled numeric variable
Important
✏️ Defaults
Statistical approaches available
ggscatterstats()
Hypothesis about correlation: Two numeric variables
ggcorrmat()
Hypothesis about correlation: Multiple numeric variables
ggpiestats()
Hypothesis about composition of categorical variables
ggbarstats()
Hypothesis about composition of categorical variables
ggcoefstats()
Hypothesis about regression coefficients
Important
✏️ Defaults
Supports all regression models supported in {easystats} ecosystem.
Meta-analysis is also supported!
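A minimal sketch with an ordinary linear model. The model formula is an arbitrary illustration.

```r
library(ggstatsplot)

# Fit any supported regression model; lm() is used here for simplicity
model <- lm(mpg ~ wt + hp, data = mtcars)

# Dot-and-whisker plot of the coefficient estimates with their statistics
p <- ggcoefstats(model)
p
```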
Iterating over a grouping variable
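Each main function has a `grouped_` counterpart that repeats the same analysis for every level of a grouping variable. A sketch with `grouped_gghistostats()`; the test value and layout settings are illustrative choices.

```r
library(ggstatsplot)

# One histogram + one-sample test per species, combined into a single figure
p <- grouped_gghistostats(
  data = iris,
  x = Sepal.Length,
  test.value = 5,                        # arbitrary comparison value
  grouping.var = Species,
  plotgrid.args = list(nrow = 1),        # forwarded to the plot layout
  annotation.args = list(tag_levels = "i")
)
p
```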
{ggstatsplot} benefits
Note
| Functions | Description | Parametric | Non-parametric | Robust | Bayesian |
|---|---|---|---|---|---|
| ggbetweenstats() | Between group comparisons | ✅ | ✅ | ✅ | ✅ |
| ggwithinstats() | Within group comparisons | ✅ | ✅ | ✅ | ✅ |
| gghistostats(), ggdotplotstats() | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
| ggcorrmat() | Correlation matrix | ✅ | ✅ | ✅ | ✅ |
| ggscatterstats() | Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
| ggpiestats(), ggbarstats() | Association between categorical variables | ✅ | NA | NA | ✅ |
| ggpiestats(), ggbarstats() | Equal proportions for categorical variable levels | ✅ | NA | NA | ✅ |
| ggcoefstats() | Regression modeling | ✅ | ✅ | ✅ | ✅ |
| ggcoefstats() | Random-effects meta-analysis | ✅ | NA | ✅ | ✅ |
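Switching between these approaches is a matter of changing the `type` argument. The sketch below uses `ggbetweenstats()` as one example; the dataset and the 2 x 2 layout are illustrative choices.

```r
library(ggstatsplot)

types <- c("parametric", "nonparametric", "robust", "bayes")

# Same data, same plot, four statistical approaches
plots <- lapply(types, function(type) {
  ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length,
    type = type,
    title = type
  )
})

# Arrange the four plots in a grid
combined <- combine_plots(plots, plotgrid.args = list(nrow = 2))
combined
```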
“half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion”
Since the plot and the statistical analysis are yoked together, the chances of making an error in reporting the results are minimized.
No need to worry about updating figures and statistical details separately. 🔗
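Because the statistics live inside the plot object, the numbers for the text can be pulled from the same place as the figure. A sketch, assuming a {ggstatsplot} version that exports `extract_stats()`:

```r
library(ggstatsplot)

p <- ggbetweenstats(data = iris, x = Species, y = Sepal.Length)

# Data frames behind the annotations shown on the plot
stats <- extract_stats(p)

# Details of the test reported in the subtitle
stats$subtitle_data

# The p-value to cite in the text comes from the same object as the figure
stats$subtitle_data$p.value
```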
\(p > 0.05\): The null hypothesis (H0) can’t be rejected
But can it be accepted?! Null Hypothesis Significance Testing 🤫
“In 72% of cases, nonsignificant results were misinterpreted, in that the authors inferred that the effect was absent. A Bayesian reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., \(BF_{01} > 10\)) in favor of the null hypothesis over the alternative hypothesis.”
Juxtaposing frequentist and Bayesian statistics for the same analysis helps to properly interpret the null results.
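A sketch of this juxtaposition with the built-in `ToothGrowth` data (a placeholder chosen because its default Welch's t-test is nonsignificant at the .05 level). With the default parametric approach, the subtitle carries the frequentist test and the caption carries the Bayesian counterpart.

```r
library(ggstatsplot)

p <- ggbetweenstats(
  data = ToothGrowth,
  x = supp,
  y = len,
  bf.message = TRUE  # show the Bayes factor message in the caption
)

stats <- extract_stats(p)

# Frequentist result (subtitle): p > 0.05, so H0 can't be rejected
stats$subtitle_data$p.value

# Bayesian result (caption): how much evidence is there for or against H0?
stats$caption_data
```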
❌ an alternative to learning ggplot2
✅ the more you know ggplot2, the better you can modify the defaults to your liking
❌ meant to be used in talks/presentations
✅ the defaults are too complicated for effectively communicating results in time-constrained presentation settings, e.g. conference talks
❌ only relevant when used in publications
✅ not necessarily; it can also be useful during the exploratory phase alone